An Exploration of Analysis Methods on Predictive Models of Student Success¶

Alex Beckwith¶

May 2023¶

Presentation Itinerary¶

  • Introduction
    • Presentation Itinerary
    • Quick Summary
  • Motivations
    • Personal Goals
    • Research Goals
    • Research Questions
  • Previous Research
    • Learning Analytics/Education Data Mining
    • Predicting Student Performance
    • Model Evaluation Methods
  • Experimental Architecture
    • Dataset
    • Feature Extraction
    • Algorithms & Hyperparameters
    • Model Pipeline
  • Model Evaluation
    • Naive Averaging
    • Null Hypothesis Significance Testing (NHST)
    • Bayesian
    • Future Research
  • Wrap Up
    • Questions
    • Tools Used
    • Top References

Quick Summary¶

  • Built a system to train, test, & evaluate machine learning models
  • Applied to educational data from an online university
  • Used system to generate predictions
  • Analyzed results

Motivations¶

Personal Goals¶

  • Research personally relevant topic (education)
  • Apply knowledge of SQL/Python/data from job as data analyst
  • Apply interest/knowledge of predictive models learned independently and in data science program
  • Increase knowledge of statistical evaluation methods

Research Goals¶

  • Evaluate machine learning models using best practices/methods/tooling
  • Determine whether Bayesian or frequentist methods are better for comparing machine learning models
  • Test new metric for evaluation of model fairness
  • Apply above goals to case study with education dataset

Research Questions¶

  1. Which models and feature sets are best at predicting student outcomes?
  2. How do the results differ when models are compared using naive, frequentist and Bayesian methods?
  3. Is there an association between model predictive performance and Absolute Between Receiver Operating Characteristic Area (ABROCA)?

Previous Research¶

Learning Analytics/Education Data Mining¶

  • Educational Data Mining (EDM) is concerned with developing methods for exploring the unique types of data that come from educational environments
      - It can also be defined as the application of data mining (DM) techniques to datasets from educational environments to address important educational questions.
  • Learning Analytics (LA) can be defined as "the measurement, collection, analysis, and reporting of data about learners and their contexts, for purposes of understanding and optimizing learning and the environments in which it occurs" (Lang, Siemens, Wise, & Gasevic, 2017). There are three crucial elements involved in this definition: data, analysis, and action. (Romero & Ventura, "Educational data mining and learning analytics: An updated survey")

Predicting Student Performance¶

  • Within EDM/LA, this work looked specifically at predicting student performance
  • Important to:
    • Detect at-risk students early, to divert resources to students in need
    • Trace knowledge transfer

Most common types of prediction:¶

  1. Classification
  2. Regression
  3. Clustering

Common Predictions:¶

  1. Final outcome
    • Dropout
    • Pass/Fail
  2. Final grades
  3. Deadline compliance

Most common algorithms:¶

  1. Tree-based
    • Decision Tree
    • Random Forest
    • Boosted
  2. Regression
    • Logistic Regression
    • Linear Regression
  3. Support Vector Machines
  4. Bayesian
    • Naive Bayes
  5. K-Nearest-Neighbor
  6. Artificial Neural Networks
  • Best-performing models are typically ensemble methods

Most common student data sources:¶

  1. Computer-based learning environment
    • Massive Open Online Course (MOOCs)
    • Intelligent Tutoring Systems (ITS)
    • Learning Management System
  2. In-person

Online -> more data available, and the data are more consistent. Blended learning needs more study.

Most common feature types:¶

  1. Academic data
    • Assessments
  2. Demographic data
  3. Behavior
    • Virtual learning environment (VLE) interactions
  4. Financial aid data

Feature Extraction Strategy¶

  • Automated vs Expert-Engineered vs Crowdsourced
  • Automated features can perform better, but are often less interpretable
    • "AutoML Feature Engineering for Student Modeling Yields High Accuracy, but Limited Interpretability" (Nigel Bosch, University of Illinois Urbana-Champaign)
    • TSFRESH (Time Series FeatuRe Extraction on basis of Scalable Hypothesis tests) performed better than both, but was the most difficult to interpret
  • Another source suggested crowdsourcing features in addition to expert engineering

Model Evaluation Methods¶

Naive Averaging¶

  • Simply sorting by a metric and picking the top average value
  • Difficult to discern differences between models
  • No sense of variability between model types/settings
  • Hard to extrapolate from
  • Tough to weigh other factors like interpretability & fit time in a justified way
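The limitation above can be made concrete with a minimal sketch. The model names and scores below are hypothetical; the point is that ranking by mean alone discards all variability information.

```python
# Minimal sketch of "naive averaging": rank models by mean test score alone.
# Model names and per-run scores are hypothetical illustrations.
results = [
    ("rforest",   [0.771, 0.776, 0.775]),
    ("hxg_boost", [0.768, 0.772, 0.770]),
    ("logreg",    [0.748, 0.752, 0.750]),
]

def naive_rank(results):
    """Sort by mean score, descending; ignores variability entirely."""
    means = [(name, sum(scores) / len(scores)) for name, scores in results]
    return sorted(means, key=lambda t: t[1], reverse=True)

print(naive_rank(results)[0][0])  # top model by average alone
```

Note that the top two models here differ by a few thousandths, well within run-to-run noise, yet the naive ranking declares a single winner.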

Frequentist (Null Hypothesis Significance Testing)¶

Critical difference diagram based on results from post-hoc Nemenyi tests:

diagram shows regions for which the null hypothesis cannot be rejected

  • Friedman test to show whether groups of results are similar (global test)
  • Post-hoc Nemenyi Test to indicate if significant difference exists between two models
  • Tough to compare large set of models in this way
  • Used non-parametric tests to minimize assumptions about the distributions of model results
  • Better suited to model output data
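The global step above can be sketched with SciPy's Friedman test on per-fold scores. The scores are hypothetical; the pairwise post-hoc Nemenyi step would typically use a separate package such as scikit-posthocs.

```python
# Sketch of the global (Friedman) test on per-fold scores for three models.
# Scores are hypothetical; friedmanchisquare takes one sample per group.
from scipy.stats import friedmanchisquare

model_a = [0.771, 0.768, 0.774, 0.770, 0.772]
model_b = [0.765, 0.762, 0.769, 0.764, 0.766]
model_c = [0.748, 0.745, 0.751, 0.747, 0.749]

stat, p_value = friedmanchisquare(model_a, model_b, model_c)
if p_value < 0.05:
    # Global difference detected; follow up with pairwise post-hoc Nemenyi
    # tests (e.g. via the scikit-posthocs package) to locate where it lies.
    print(f"Reject global null (p = {p_value:.4f})")
```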

Bayesian¶

Output from Signed Rank Test
  • Uses the Bayesian signed rank test to estimate the probability of the difference in means lying in a prespecified "Region of Practical Equivalence" (ROPE)
  • This test used a ROPE value of 0.01, indicating that a 1% difference in means is a wide enough band to consider the performance of two models equivalent for all practical purposes
Output from test on multiple datasets:
  • A Bayesian posterior plot resulting from a Bayesian hierarchical correlated t-test
  • visualizes the results of Markov-Chain Monte Carlo (MCMC) sampling for the comparison of two models X and Y
  • The estimated probability of each outcome is the proportion of samples that fall in each section of the plot.
  • "heavier" than signed rank test, so less convenient
  • baycomp uses hierarchical for multiple datasets, signed rank for single
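The ROPE idea can be illustrated without baycomp itself. The sketch below is a simplified bootstrap stand-in, not baycomp's actual Bayesian signed rank procedure, and the per-fold scores are hypothetical.

```python
# Hedged sketch: a simplified bootstrap stand-in for estimating the
# probability that the score difference between two models lies inside a
# ROPE of +/-0.01. (The actual analysis used baycomp's Bayesian signed
# rank test; this only illustrates the ROPE idea with made-up scores.)
import random

random.seed(0)
scores_x = [0.771, 0.768, 0.774, 0.770, 0.772]
scores_y = [0.769, 0.767, 0.772, 0.771, 0.770]
diffs = [x - y for x, y in zip(scores_x, scores_y)]

ROPE = 0.01
samples = 10_000
in_rope = 0
for _ in range(samples):
    resample = random.choices(diffs, k=len(diffs))  # bootstrap the differences
    mean_diff = sum(resample) / len(resample)
    in_rope += -ROPE < mean_diff < ROPE

print(f"P(practically equivalent) ~ {in_rope / samples:.2f}")
```

Here every observed difference is far inside the ROPE, so the estimated probability of practical equivalence is essentially 1.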

ABROCA | Slicing Analysis¶

(Absolute Between Receiver Operating Characteristic Area)¶

Receiver Operating Characteristic (ROC Curve)¶

  • A plot of the true positive rate against the false positive rate over the range of threshold values for a predictor
  • Area under ROC curve (ROC AUC) commonly used as metric to optimize performance of machine learning models.
    • Perfect predictor -> ROC AUC = $1.0$ (correct prediction at all threshold values)
    • Random predictor -> ROC AUC = $0.5$ (equally likely to pick correctly or incorrectly at all threshold values)
  • Link to more math
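ROC AUC can also be computed directly from its probabilistic interpretation: the chance that a randomly chosen positive is scored above a randomly chosen negative (ties counting half). The labels and scores below are hypothetical.

```python
# Sketch: ROC AUC via its probabilistic interpretation (equivalent to the
# normalized Mann-Whitney U statistic). Labels/scores are hypothetical.
def roc_auc(labels, scores):
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    # Count positive-vs-negative "wins"; ties count half.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
print(roc_auc(labels, scores))  # 8 of 9 pairs ranked correctly
```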

Note: explain that ROC was originally used to measure the performance of radar equipment.


ABROCA | Slicing Analysis¶

(Absolute Between Receiver Operating Characteristic Area)¶

  • Proposed as metric with which to compare predictive model fairness
    • First introduced at 2019 International Learning Analytics and Knowledge Conference
  • To calculate:
    • Split dataset by feature of interest
    • Calculate ROC curves for the model on each part of split dataset
    • Sum absolute values of between-curve area
  • How does this relate to fairness?
    • A model that predicts subgroups of split dataset equally would have ABROCA = 0 (Same ROC curves, so no area between)
    • Hypothesis - Higher ABROCA associated with lower predictive fairness
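The calculation steps above can be sketched as follows: build a ROC curve per subgroup, interpolate both onto a common FPR grid, and integrate the absolute gap. The data and group split below are hypothetical; this is a minimal sketch, not the reference ABROCA implementation.

```python
# Hedged sketch of the ABROCA calculation with hypothetical data.
import numpy as np

def roc_curve(labels, scores):
    # Sort by descending score and sweep thresholds.
    order = np.argsort(-np.asarray(scores))
    y = np.asarray(labels)[order]
    tps = np.cumsum(y)
    fps = np.cumsum(1 - y)
    tpr = np.concatenate(([0.0], tps / tps[-1]))
    fpr = np.concatenate(([0.0], fps / fps[-1]))
    return fpr, tpr

def abroca(labels_a, scores_a, labels_b, scores_b):
    fpr_a, tpr_a = roc_curve(labels_a, scores_a)
    fpr_b, tpr_b = roc_curve(labels_b, scores_b)
    grid = np.linspace(0, 1, 1001)  # common FPR grid
    gap = np.abs(np.interp(grid, fpr_a, tpr_a) - np.interp(grid, fpr_b, tpr_b))
    # Uniform grid on [0, 1], so the mean approximates the integral.
    return float(gap.mean())

# Identical subgroup predictions -> identical curves -> ABROCA = 0.
labels = [1, 0, 1, 0]
scores = [0.9, 0.2, 0.8, 0.3]
print(abroca(labels, scores, labels, scores))
```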

Dataset¶

The Open University¶

  • Exclusively online university
  • Largest university by enrollment in UK
  • Provides one of the largest public learning analytics datasets

The files¶

  • Open University Learning Analytics Dataset (OULAD)
    • Anonymized using ARX anonymization tool
  • Massive Open Online Courses (MOOCs)
    • 2 years (2013 & 2014)
    • 7 courses
    • 23 presentations
    • 32,593 students
    • 10,655,280 aggregated Virtual Learning Environment (VLE) activity records
  • For course to be included in OULAD
    • The number of students in the selected module-presentation is larger than 500.
    • At least two presentations of the module exist.
    • VLE data are available for the module-presentation (since not all the modules are studied via VLE).
    • The module has a significant number of failing students.
  • (VLE activity aggregated as clicks per student, per activity, per course, per day)
  • 7 Tables
    • Course Info
    • Student Info
    • Assessment Info
    • Virtual Learning Environment (VLE) Summaries
      • (clicks per day, per resource, per student)
    • 3 Bridge Tables

can add course level details as notes

Compared 2013 & 2014 data with a sample from 2015 to check for significant changes in demographics; at a significance level of 0.05, no null hypotheses would be rejected.

Age (age_band)¶

OULAD (Red & 2013-2014), vs 2015 data (Blue)
Large bins are part of the effort to anonymize the data.
Describing first30.all_features.age_band

Proportions:
    value  frequency  proportion
0   0-35      16590    0.694956
1  35-55       7091    0.297043
2   55<=        191    0.008001 

Null count: 0
count     23872
unique        3
top        0-35
freq      16590
Name: age_band, dtype: object 

Index of Multiple Deprivation (imd_band)¶

Example Map of IMD

In the current English Indices of Deprivation 2019 (IoD2019) seven domains of deprivation are considered and weighted as follows,

  • Income. (22.5%)
  • Employment. (22.5%)
  • Education. (13.5%)
  • Health. (13.5%)
  • Crime. (9.3%)
  • Barriers to Housing and Services. (9.3%)
  • Living Environment. (9.3%)
OULAD (Red & 2013-2014), vs 2015 data (Blue)

Lower = more deprived.

Describing first30.all_features.imd_band

Proportions:
       value  frequency  proportion
0    20-30%       2639    0.110548
1    30-40%       2539    0.106359
2    40-50%       2314    0.096934
3    50-60%       2316    0.097017
4    60-70%       2171    0.090943
5    70-80%       2180    0.091320
6    80-90%       2063    0.086419
7   90-100%       2001    0.083822
8      None          0    0.000000
9     0-10%       2303    0.096473
10    10-20       2406    0.100788 

Null count: 940
count      22932
unique        10
top       20-30%
freq        2639
Name: imd_band, dtype: object 

Region (region)¶

Describing first30.all_features.region

Proportions:
                    value  frequency  proportion
0   North Western Region       2032    0.085121
1               Scotland       2701    0.113145
2           South Region       2278    0.095426
3                Ireland        938    0.039293
4                  Wales       1642    0.068784
5      South East Region       1559    0.065307
6      South West Region       1753    0.073433
7   West Midlands Region       1845    0.077287
8       Yorkshire Region       1440    0.060322
9    East Anglian Region       2434    0.101960
10  East Midlands Region       1680    0.070375
11         London Region       2223    0.093122
12          North Region       1347    0.056426 

Null count: 0
count        23872
unique          13
top       Scotland
freq          2701
Name: region, dtype: object 

Highest Education¶

Describing first30.all_features.highest_education

Proportions:
                          value  frequency  proportion
0           Lower Than A Level       9187    0.384844
1        A Level or Equivalent      10541    0.441563
2             HE Qualification       3660    0.153318
3              No Formal quals        225    0.009425
4  Post Graduate Qualification        259    0.010850 

Null count: 0
count                     23872
unique                        5
top       A Level or Equivalent
freq                      10541
Name: highest_education, dtype: object 

Course Domain¶

  • STEM or Social Studies
Describing first30.all_features.is_stem

Proportions:
    value  frequency  proportion
0      1      16147    0.676399
1      0       7725    0.323601 

Null count: 0
count    23872.000000
mean         0.676399
std          0.467860
min          0.000000
25%          0.000000
50%          1.000000
75%          1.000000
max          1.000000
Name: is_stem, dtype: float64 

Final Result¶

Describing first30.all_features.final_result

Proportions:
          value  frequency  proportion
0         Fail       5728    0.239946
1         Pass      10857    0.454801
2    Withdrawn       4811    0.201533
3  Distinction       2476    0.103720 

Null count: 0
count     23872
unique        4
top        Pass
freq      10857
Name: final_result, dtype: object 

Experimental Architecture¶

Data Processing/Analysis¶

Initial Database Schemas¶

  • PostgreSQL
  • Landing
    • Raw CSV load
  • Staging
    • Datatype and naming standardization
  • Main
    • Data architecture optimization
    • Categorical/text columns stored in tables linked with integer foreign keys
    • Joined data saved in views
Showing 10 rows from "landing"."studentInfo"
code_module code_presentation id_student gender region highest_education imd_band age_band num_of_prev_attempts studied_credits disability final_result
0 AAA 2013J 11391 M East Anglian Region HE Qualification 90-100% 55<= 0 240 N Pass
1 AAA 2013J 28400 F Scotland HE Qualification 20-30% 35-55 0 60 N Pass
2 AAA 2013J 30268 F North Western Region A Level or Equivalent 30-40% 35-55 0 60 Y Withdrawn
3 AAA 2013J 31604 F South East Region A Level or Equivalent 50-60% 35-55 0 60 N Pass
4 AAA 2013J 32885 F West Midlands Region Lower Than A Level 50-60% 0-35 0 60 N Pass
5 AAA 2013J 38053 M Wales A Level or Equivalent 80-90% 35-55 0 60 N Pass
6 AAA 2013J 45462 M Scotland HE Qualification 30-40% 0-35 0 60 N Pass
7 AAA 2013J 45642 F North Western Region A Level or Equivalent 90-100% 0-35 0 120 N Pass
8 AAA 2013J 52130 F East Anglian Region A Level or Equivalent 70-80% 0-35 0 90 N Pass
9 AAA 2013J 53025 M North Region Post Graduate Qualification None 55<= 0 60 N Pass
  • Landing
    • Raw CSV load
    • Mostly text columns
    • 3 columns needed to uniquely identify a row
  • Staging
    • Datatype and naming standardization
  • Main [Maybe ERD]
    • Data architecture optimization
    • Categorical/text columns stored in tables linked with integer foreign keys
    • Joined data saved in views
Showing 10 rows from main.student_info
id orig_student_id course_id module_id presentation_id age_band_id imd_band_id highest_education_id region_id final_result_id is_female has_disability date_registration date_unregistration studied_credits num_of_prev_attempts
0 32588 3733 10 4 2 3 10 2 9 4 0 0 -68 -8.0 60 0
1 22291 6516 2 1 4 3 9 2 7 3 0 0 -52 NaN 60 0
2 32538 8462 9 4 4 3 4 2 4 4 0 0 -38 18.0 60 1
3 32539 8462 10 4 2 3 4 2 4 4 0 0 -137 119.0 90 0
4 22281 11391 1 1 2 3 10 2 1 3 0 0 -159 NaN 240 0
5 4461 23629 4 2 1 1 3 3 1 2 1 0 -47 NaN 60 2
6 26116 23632 3 2 2 1 5 1 1 4 1 0 -194 -51.0 60 0
7 14301 23698 7 3 4 1 6 1 1 3 1 0 -110 NaN 120 0
8 994 23798 3 2 2 1 6 1 11 1 0 0 -27 NaN 60 0
9 11595 24186 20 7 3 1 2 3 13 3 1 1 -25 NaN 30 0
  • Landing
    • Raw CSV load
  • Staging
    • Datatype and naming standardization
  • Main [Maybe ERD]
    • Data architecture optimization
    • Categorical/text columns stored in tables linked with integer foreign keys
    • Joined data saved in views
    • all ints or floats

Feature Extraction¶

  • Categories
    • Demographic Info
    • Course Info
    • VLE Interaction Data
    • Assignment Data
  • Demographic Info
  • Course Info
    • course level
    • course subject
  • VLE Interaction Data
  • Assignment Data
    • n assignments created/assigned
    • calculated moments about the mean for the number of days early or late students turned in assignments
  • Agg
    • Aggregations and calculations
  • Feat
    • First pass at organizing features/calculations for predictive models
  • First30
    • Version of Feat created using first 30 days of class data
    • Excluded if withdrew before class day 30
  • Agg
    • Aggregations and calculations
    • [Avg Assignment Days Early by N Days Active]
  • Feat
    • First pass at organizing features/calculations for predictive models
    • [N Days Active]
    • [N Distinct Top 5th by Visits]
  • First30
    • Version of Feat created using first 30 days of class data
    • Captures 49.60% of all withdrawn students
    • Captures 72.22% of students who withdrew after class started
    • Soon enough to make actionable difference to most withdrawing/failing students
    • [Final Result]
  • Model
    • Logging of model execution data
  • Eval
    • Organization of model analysis calculations

example of aggregated feature

example of expert-recommended engineered feature
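One of the engineered features above, the moments about the mean for how many days early or late students turned in assignments, can be sketched with the standard library. The submission values below are hypothetical (negative = late).

```python
# Sketch: moments about the mean for days-early assignment submissions
# (negative = late). Values are hypothetical; population moments used.
import statistics

days_early = [3, 5, -1, 0, 2, 4, -2, 1]

mean = statistics.fmean(days_early)
variance = statistics.pvariance(days_early, mean)      # 2nd central moment
third_moment = sum((x - mean) ** 3 for x in days_early) / len(days_early)
skewness = third_moment / variance ** 1.5              # standardized 3rd moment

print(mean, variance, round(skewness, 3))
```

Higher moments like skewness distinguish a class that is consistently a little early from one with a few extremely late outliers, even when the means match.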

Classification Algorithms & Hyperparameters¶

  • Grid Search
    • Created large arrays of available hyperparameters
    • Brute-force search through combination of available hyperparameters
  • Random Search
    • Used GridSearch results to limit the bounds of hyperparameter settings
    • Created random variables to represent distributions of particular hyperparameters, bounded by results from GridSearch
    • Ran models where each iteration picked from a model's available parameter combinations and distributions

Note: GridSearch shown first; all subsequent examples are generated randomly.

Example of GridSearch Cross-Validation for hxg_boost:
clf__learning_rate [0.1]
clf__random_state [None]


clf__learning_rate [0.01]
clf__random_state [None]


clf__learning_rate [0.001]
clf__random_state [None]


Note the incremental changes. random_state is a way to freeze a random generator seed; only recommended during development because of "seed optimization."

Decision Tree (dtree/DT)¶

  • Simple decision rules are optimized from features to sort data

Note: point out the model type code and its abbreviation in the visual.

Example of RandomizedSearch Cross-Validation for dtree:
clf__splitter ['best']
clf__random_state [None]
clf__min_samples_split [61]
clf__min_samples_leaf [8]
clf__max_features ['sqrt']
clf__max_depth [16]
clf__criterion ['log_loss']


Ada Boost (ada_boost/ADA)¶

  • Ensemble Method
  • Fits on original dataset, then creates copies which weight incorrectly classified instances more heavily in sequential cycles
  • Used Decision Tree as base estimator, but can use many
Example of RandomizedSearch Cross-Validation for ada_boost:
clf__learning_rate [0.009607840680411647]
clf__random_state [None]


Histogram Gradient-Boosting (hxg_boost/HGB)¶

  • Similar to Ada Boost, but correction based on gradient of loss function from residuals (gradient descent)
  • Dataset large enough that Histogram Gradient-Boosting Classifier much faster than Regular Gradient-Boosting Classifier
  • Histograms increase training efficiency by bucketing continuous features
Example of RandomizedSearch Cross-Validation for hxg_boost:
clf__interaction_cst ['no_interactions']
clf__l2_regularization [0.3040953370005851]
clf__learning_rate [0.03922058903998265]
clf__max_bins [195]
clf__max_depth [45]
clf__max_iter [74]
clf__min_samples_leaf [8]
clf__random_state [None]
clf__warm_start [True]


Random Forest (rforest/RF)¶

  • Ensemble Method
  • Fits many decision trees on sub-samples of dataset, then uses averaging to boost accuracy and control over-fitting
Example of RandomizedSearch Cross-Validation for rforest:
clf__bootstrap [True]
clf__criterion ['entropy']
clf__max_features ['log2']
clf__max_samples [0.12272488337757193]
clf__min_samples_leaf [5]
clf__min_samples_split [9]
clf__n_estimators [52]
clf__n_jobs [-1]
clf__oob_score [True]
clf__random_state [None]


Extra Trees (etree/ET)¶

  • Ensemble Method
  • Fits many decision trees on sub-samples of the dataset like Random Forest, but chooses split points at random rather than searching for the best split
Example of RandomizedSearch Cross-Validation for etree:
clf__bootstrap [True]
clf__criterion ['gini']
clf__max_features ['sqrt']
clf__max_samples [0.1400021301694605]
clf__min_samples_leaf [9]
clf__min_samples_split [5]
clf__n_estimators [103]
clf__n_jobs [-1]
clf__oob_score [True]
clf__random_state [None]


Extra Trees vs Random Forest¶

  • Both construct many decision trees during execution & avg for classification/regression
  • RF uses bootstrapping to sample subsets, ET by default does not
  • RF looks for best split, ET randomly selects split
  • ET typically will have faster fit times & lower variance, higher bias
  • Performance of ET vs RF is often conditional upon feature selection/noisiness

K-Nearest Neighbor (knn/KNN)¶

  • Calculates most likely value based on proximity to other points in numeric space
Example of RandomizedSearch Cross-Validation for knn:
clf__weights ['distance']
clf__p [1]
clf__n_neighbors [4]
clf__n_jobs [-1]
clf__leaf_size [56]
clf__algorithm ['ball_tree']


Logistic Regression (logreg/LOG)¶

  • Calculates most likely value based on contribution of independent variables
Example of RandomizedSearch Cross-Validation for logreg:
clf__C [0.3234846473870656]
clf__penalty ['l2']
clf__random_state [None]
clf__solver ['liblinear']


Multi-Layer Perceptron (mlp/MLP)¶

  • Simple (vanilla) neural network
  • Consists of layers of connected nodes with activation functions
  • Optimizes weights of nodes in each layer using backpropagation during training
  • Last layer is output layer, which produces most likely result given trained inputs
Example of RandomizedSearch Cross-Validation for mlp:
clf__activation ['logistic']
clf__alpha [0.0014749079430959996]
clf__early_stopping [False]
clf__hidden_layer_sizes [165]
clf__learning_rate ['invscaling']
clf__learning_rate_init [0.008378338921501449]
clf__max_iter [63]
clf__power_t [0.02097768921985744]
clf__random_state [None]
clf__solver ['sgd']


Support Vector Machines (svc/SVC)¶

  • A hyperplane is optimized to best split the data into different spatial regions
Example of RandomizedSearch Cross-Validation for svc:
clf__C [0.08503983103378839]
clf__degree [2]
clf__gamma ['scale']
clf__kernel ['rbf']
clf__probability [True]
clf__random_state [None]


Not Implemented¶

  • Others considered but not implemented due to necessary data preprocessing changes, compute, or memory overhead
  • Gaussian
    • (Blew up RAM)
  • Naive Bayes
    • (Would need to preprocess data differently)
  • Gradient Boosting
    • (Histogram-Based Algorithm more efficient at this scale)
Example of RandomizedSearch Cross-Validation for compnb:
clf__alpha [0.0032502548421112333]
clf__norm [True]


Model Pipeline¶

Data Preprocessing¶

  • Categorical Data -> One Hot
  • Boolean Data -> Bit $\left( True = 1, False = 0 \right)$
  • Numeric Data -> Standardized $\left( \mu = 0, \sigma = 1 \right)$
  • Imputing Strategy = Constant = $0$
  • Variance Threshold = $0$
  • Dim Reduction = Principal Component Analysis w/ Maximum Likelihood Estimation

Note: replaced all missing values with zeros to avoid removing important data (not always the best strategy). Standardized -> mean 0, variance 1.
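Two of the preprocessing steps listed above, one-hot encoding and standardization, can be sketched in numpy. The column values are hypothetical; the full pipeline also imputed constants, applied a variance threshold, and ran PCA.

```python
# Minimal numpy sketch of two preprocessing steps: one-hot encoding a
# categorical column and standardizing a numeric one. Values are
# hypothetical; the real pipeline used scikit-learn transformers.
import numpy as np

ages = np.array(["0-35", "35-55", "0-35", "55<="])
categories = sorted(set(ages))  # stable column order for the encoding
one_hot = (ages[:, None] == np.array(categories)).astype(int)

credits = np.array([60.0, 120.0, 60.0, 240.0])
standardized = (credits - credits.mean()) / credits.std()  # mu=0, sigma=1

print(one_hot.shape, round(standardized.mean(), 10))
```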

Model Training Settings¶

  • Cross Validation Type = Repeated Stratified K Fold
  • Cross Validation Splits $ = 5$
  • Cross Validation Repeats $ = 2$
  • Runs per Model $ = 10$
  • Refit Parameter = ROC AUC
  • Feature to Predict = "is_withdraw_or_fail"
  • Train-Test Split Ratio = $[0.25, 0.35]$

Model Evaluation¶

Naive Averaging¶

model_type mean_fit_time std_fit_time mean_score_time std_score_time mean_test_roc_auc std_test_roc_auc
0 rforest 43.546772 6.621612 0.308331 0.079489 0.773876 0.006709
1 etree 22.712446 2.100217 0.221903 0.027837 0.771116 0.004962
2 hxg_boost 8.673935 0.686707 0.297604 0.038754 0.770126 0.004659
3 mlp 13.990430 0.143472 0.077275 0.011556 0.769375 0.007105
4 hxg_boost 6.469628 1.027840 0.263747 0.042775 0.769049 0.006249
5 etree 14.824491 3.315720 0.251914 0.041687 0.767965 0.004917
6 hxg_boost 1.569689 0.177629 0.086444 0.013947 0.767947 0.005761
7 mlp 1.304684 0.044562 0.029652 0.002250 0.767719 0.008325
8 rforest 21.928703 4.551270 0.763771 0.412899 0.767640 0.005182
9 hxg_boost 1.494005 0.194275 0.075681 0.012553 0.767497 0.004604

drive home the point that just sorting by average is very limiting

NHST¶

  • Significance Value = 0.05
  • Recall: used the non-parametric Friedman test to check for a global difference, then used pairwise post-hoc Nemenyi tests

Bayesian¶

  • Recall: used the Bayesian signed rank test to check for practical equivalence (in this case, ROPE = 0.002)
  • Models using all features made better predictions than those built from a subset of feature categories
  • (This study got those results; another found assignment data alone better)

ABROCA¶

  • Compares the ABROCA performance of two models
  • Two models = logreg and etree
  • For this run, etree has on average better predictive performance
    • Follows previous research: more data, better predictions
  • On ABROCA, the models are similar for disability; logreg is better on gender balance
  • Future work: baycomp with ABROCA as the metric to analyze
  • Follows research -> weak/no relationship between ABROCA and performance (measured by ROC AUC)
  • Follows research -> quadratic relationship between ABROCA & demographic balance
  • (Not necessary to sacrifice predictive performance while researching model fairness)
  • Makes sense because the metric modulated is a 2D area
  • & models can be expected to perform worse with less training data

Future Research¶

  • More comprehensive metric evaluation of ABROCA
    • Statistics
    • Other demographic characteristics
  • Refinement of feature extraction
  • Automated optimization/analysis of hyperparameter probability distributions
  • Explore relationship between mathematical properties of ROC & ABROCA
  • Expend more computing resources on hierarchical comparisons rather than different parameterizations

Wrap Up¶

Questions?¶

Tools Used¶

Scripting Language of Choice
Notebook Engine
Database System
Database Communication
Database Communication & Data Manipulation
Visualizations
Visualizations
Array Computation

Bayesian Statistical Tests

baycomp¶

by:

  • Janez Demsar
  • Alessio Benavoli
  • Giorgio Corani
Random Variables & Statistical Tests
Machine Learning Models & Components

Top References¶

  • Time for a Change: a Tutorial for Comparing Multiple Classifiers Through Bayesian Analysis
  • Evaluating the Fairness of Predictive Student Models Through Slicing Analysis
  • Evaluating Predictive Models of Student Success: Closing the Methodological Gap
  • Exploring the Link between Online Behaviours and Course Performance in Asynchronous Online High School Courses
  • Educational data mining and learning analytics: An updated survey